21Q: How does the Google search engine find whatever I want?
Introduction:
There are many search engines on the web, and Google holds first place among them. About 69.1% of users search with Google, while the remaining 30.9% use others such as MSN and Yahoo. PageRank was developed by Larry Page and Sergey Brin, initially as part of a research project on a new kind of search engine. That project started in 1995 and led to a functional prototype in 1998. PageRank is the heart of the Google search engine, so the later part of this answer focuses on it.

Short Answer:
Search engines are commonly used to find information on the Internet. Today, well-known search engines such as Google, Yahoo, MSN and Bing return good results for our queries. High-ranking websites generate the most traffic and therefore create the largest commercial opportunities. Google is unique in its focus on developing the perfect search engine, one that understands accurately what users mean and gives them back precisely the desired information. It combines 'PageRank technology' with complex 'text-matching techniques' to ensure its search quality. PageRank technology examines the entire link structure of the Internet and locates the most important pages. It measures the importance of web pages by solving an equation of more than 500 million variables and 2 billion terms.

Long Answer:
General Architecture of a Search Engine:
• Spider (Crawler): gathers information.
• Indexer: analyzes the information gathered by the spider.
• Searcher: ranks the information gathered by the spider.

Different PageRank Algorithms:
Simplified Equation:
Although the overall ranking is determined by many other factors as well, Google claims that the heart of its search engine software is PageRank. The PageRank algorithm assigns a PageRank score to more than 25 billion web pages on the WWW.
During the processing of a query, Google's search algorithm combines precomputed PageRank scores with text-matching scores to obtain an overall ranking score for each web page. A simplified version of PageRank is defined in equation-1:

$$PR(u) = c \sum_{v \in B(u)} \frac{PR(v)}{N_v} \qquad (1)$$

where u represents a web page, B(u) is the set of pages that point to u, PR(u) and PR(v) are the rank scores of pages u and v respectively, N_v denotes the number of outgoing links of page v, and c is a factor used for normalization. In PageRank, the rank score of a page p is evenly divided among its outgoing links. The values assigned to the outgoing links of page p are in turn used to calculate the ranks of the pages to which page p points. However, equation-1 is of limited practical use because of dangling nodes (pages with no outgoing links): most of the time it is unlikely to yield exact PageRank values.

Example: Applying the simplified equation gives the curve shown in fig-1.

Google's Original PageRank Algorithm:
PageRank was later modified on the observation that not all users follow the direct links on the WWW. The modified version is given in equation-2:

$$PR_{ori}(A) = (1 - d) + d \sum_{i=1}^{n} \frac{PR_{ori}(T_i)}{C(T_i)} \qquad (2)$$

where PR_ori(A) is the original Google PageRank of page A; PR_ori(T_i) is the original Google PageRank of a page T_i that links to page A; C(T_i) is the number of outbound links on page T_i; d is a damping factor that can be set between 0 and 1; and n is the total number of pages that link to page A.

Within the original PageRank algorithm, the PageRank of a page T is always weighted by the number of outbound links C(T) on page T. In other words, the more outbound links a page T has, the less page A benefits from a link to it on page T. Each additional inbound link for page A increases page A's PageRank. Finally, the sum of the weighted PageRanks of all pages T_i is multiplied by the damping factor d, which is generally set to 0.85. The factor d can be thought of as the probability that a user follows the links on a page, and (1 − d) as the PageRank contribution from pages that are not directly linked.
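The original formula described above can be sketched as a short iteration. This is a minimal illustration, not Google's implementation: the function name `pagerank_original`, the three-page `links` graph, the starting score of 1.0 and the fixed iteration count are all assumptions made for the example. It repeatedly applies PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)) until the scores settle.

```python
def pagerank_original(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over all pages T linking to A.

    `links` maps each page to the list of pages it links to (hypothetical format).
    """
    pages = list(links)
    pr = {page: 1.0 for page in pages}  # assumed starting score for every page
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Each linking page T passes on PR(T) divided by its outbound link count C(T).
            incoming = sum(
                pr[src] / len(links[src]) for src in pages if page in links[src]
            )
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Toy graph (an assumption for illustration): A links to B and C, B to C, C to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank_original(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph page C ends up with the highest score because it receives links from both A and B, while B, which only receives half of A's score, ends up lowest.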
The PageRank theory holds that even an imaginary surfer who is randomly clicking on links will eventually stop clicking. The probability, at any step, that the surfer will continue is the damping factor d.

Example: Trying different values of the damping factor d (from 0.2 to 1) gives figure-2, which indicates that 0.85, the value Google uses, works well: at this value the curve converges early.

Google's Second Version PageRank Algorithm:
In the second version of the algorithm, the PageRank of page A is given by equation-3:

$$PR_{sec}(A) = \frac{1 - d}{N} + d \sum_{i=1}^{n} \frac{PR_{sec}(T_i)}{C(T_i)} \qquad (3)$$

where PR_sec(A) is the second-version PageRank of page A; PR_sec(T_i) is the second-version PageRank of a page T_i that links to page A; C(T_i) is the number of outbound links on page T_i; and N is the total number of pages on the web.

The two versions of the algorithm do not differ fundamentally from each other. The second version merely replaces (1 − d) with (1 − d)/N. In terms of the Random Surfer Model, the second-version PageRank of a page is the actual probability of a random surfer reaching that page after clicking on many links. The PageRanks then form a probability distribution over web pages, which is why the sum of the PageRanks of all pages is 1.

Example: Plotting the results of equation-2 and equation-3 gives a curve showing that Google's second-version method converges earlier than Google's original method.

Using Link Types in Web Page Ranking and Filtering:
There are also other methods of page ranking. For example, rank can be propagated through links at rates that depend on the types of the links and on the user's specific set of interests. Page filtering can be decided based on link types combined with other information relevant to the links. For either a ranking or a filtering task, a profile containing a set of ranking or filtering rules to be followed in the task can be specified to reflect the user's specific interests.
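The idea that propagation rates depend on link types can be illustrated with a small sketch. This is only one possible reading of the description above, not the scheme from the cited work: the function name `typed_pagerank`, the link type labels, the `rates` profile format and the choice to split a page's score in proportion to the rates are all assumptions made for the example. A rate of 0 acts as a simple filtering rule that ignores links of that type.

```python
def typed_pagerank(links, rates, d=0.85, iterations=100):
    """PageRank-style iteration where each link's share depends on its type.

    `links` is a list of (source, destination, link_type) tuples and `rates`
    maps a link type to a propagation rate (both formats are hypothetical).
    """
    pages = sorted({p for src, dst, _ in links for p in (src, dst)})
    # Total propagation weight leaving each page under this profile.
    out_weight = {p: 0.0 for p in pages}
    for src, _, link_type in links:
        out_weight[src] += rates.get(link_type, 0.0)
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new_pr = {p: 1 - d for p in pages}
        for src, dst, link_type in links:
            rate = rates.get(link_type, 0.0)
            if rate > 0:  # a zero rate filters the link out entirely
                new_pr[dst] += d * pr[src] * rate / out_weight[src]
        pr = new_pr
    return pr


# Toy typed graph and two user profiles (all assumptions for illustration).
typed_links = [
    ("A", "B", "citation"),
    ("A", "C", "advertisement"),
    ("B", "C", "citation"),
    ("C", "A", "citation"),
]
uniform = typed_pagerank(typed_links, {"citation": 1.0, "advertisement": 1.0})
no_ads = typed_pagerank(typed_links, {"citation": 1.0, "advertisement": 0.0})
print(round(uniform["B"], 3), round(no_ads["B"], 3))
```

With the second profile the advertisement link is filtered out, so page A passes its whole score to B and B's score rises compared with the uniform profile.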
Formally, a semantic link $\beta \in \Lambda$ consists of the following components:

$$\beta = \langle t_\beta, sn_\beta, dn_\beta, A_\beta, \alpha_\beta \rangle$$

• $t_\beta \in T_\Lambda$ is the link type.
• $sn_\beta \in N$ denotes the source node of the link β.
• $dn_\beta \in N$ denotes the destination node of the link β.
• $A_\beta = \{a_{1\beta}, a_{2\beta}, \ldots, a_{k\beta}\}$ is a subset of $A_\Lambda$, the set of structured link attributes.
• $\alpha_\beta \in \Sigma$ is a free-text annotation on the link.

Additional Factors Influencing PageRank:
• Visibility of a link
• Position of a link within a document
• Distance between web pages
• Importance of a linking page
• Up-to-dateness of a linking page

Effect of Additional Inbound Links:
(1) Inbound links are one of the most important factors in successful search engine optimization.
(2) They help a page get the attention of the majority of users.
(3) They help increase the traffic, PageRank and position of a specific web page.
(4) If an inbound link contains a specific keyword, Google gives it more value.
(5) They increase marketing opportunities.
So adding more inbound links can raise the PageRank of a web page.

Effect of Additional Outbound Links:
(1) Building outbound links is also valuable for building relationships.
(2) An additional link may bring additional traffic to the linked site, and the owner of that site might eventually take notice of the linking site.
(3) They make other business communities aware of a specific web page.
(4) They can open the door to joint ventures and other marketing opportunities.
(5) Outbound links can raise a site's credibility with its visitors.

Conclusion:
Google is still working to improve the quality of PageRank in order to give us better search engine results. Practitioners of SEO (search engine optimization) are also working hard with Google in mind. We can all expect better results from Google in the future.

References:
1. Hwai-Hui Fu, Dennis K. J. Lin and Hsien-Tang Tsai, "Damping factor in Google page ranking", Department of Business Administration, Shu-Te University, 59 Hun Shan Road, Yen Chau, Kaohsiung 82445, Taiwan, 2006, IEEE.
2. Zhanzi Qiu, Matthias Hemmje and Erich J. Neuhold, "Using Link Types in Web Page Ranking and Filtering", IEEE, 2002.
3. Gyanendra Kumar, Neelam Duhan and A. K. Sharma, "Page Ranking Based on Number of Visits of Links of Web Page", International Conference on Computer & Communication Technology (ICCCT), 2011.